Voting instance selection algorithm based on learning to hash
Yajie HUANG, Junhai ZHAI, Xiang ZHOU, Yan LI
Journal of Computer Applications    2022, 42 (2): 389-394.   DOI: 10.11772/j.issn.1001-9081.2021071188

With the massive growth of data, how to store and use data has become a hot issue in both academic research and industrial applications. Instance selection is one way to address this problem: by selecting representative instances from the original data according to established rules, it reduces the difficulty of follow-up work. To this end, a voting instance selection algorithm based on learning to hash was proposed. Firstly, Principal Component Analysis (PCA) was used to map the high-dimensional data to a low-dimensional space. Secondly, the k-means algorithm was run iteratively in combination with vector quantization, and the hash codes of the cluster centers were used to represent the data. After that, instances were randomly selected from the classified data according to a given proportion, and the final instances were chosen by voting over several independent runs of the algorithm. Compared with the Condensed Nearest Neighbor (CNN) algorithm and the linear-complexity instance selection algorithm for big data named LSH-IS-F (Instance Selection algorithm by Hashing with two passes), the proposed algorithm improved the compression ratio by an average of 19%. The idea of the proposed algorithm is simple and easy to implement, and the compression ratio can be controlled automatically by adjusting the parameters. Experimental results on 7 datasets show that, at similar test accuracy, the proposed algorithm has a clear advantage over random hashing in terms of compression ratio and running time.
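
The pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no parameter values or hashing details, so every function name, default value, and the product-quantization-style hash (k-means cluster indices per subspace) below is an assumption made for illustration, and class labels are ignored for brevity.

```python
import numpy as np

def pca_project(X, n_components):
    """Project X onto its top principal components (computed via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:n_components].T

def kmeans_labels(X, k, rng, n_iter=20):
    """Plain Lloyd's k-means; returns the nearest-center index for each row."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dist.argmin(axis=1)

def hash_codes(Z, n_subspaces, k, rng):
    """Assumed hash: one k-means cluster index per subspace of Z."""
    parts = np.array_split(Z, n_subspaces, axis=1)
    return np.stack([kmeans_labels(p, k, rng) for p in parts], axis=1)

def select_once(codes, ratio, rng):
    """Randomly keep a fixed proportion of the instances in each hash bucket."""
    selected = np.zeros(len(codes), dtype=bool)
    _, bucket_ids = np.unique(codes, axis=0, return_inverse=True)
    bucket_ids = bucket_ids.ravel()
    for b in np.unique(bucket_ids):
        idx = np.flatnonzero(bucket_ids == b)
        n_keep = max(1, int(round(ratio * len(idx))))
        selected[rng.choice(idx, size=n_keep, replace=False)] = True
    return selected

def voting_instance_selection(X, n_components=2, n_subspaces=2, k=4,
                              ratio=0.3, n_runs=5, min_votes=2, seed=0):
    """Return indices of instances picked in at least min_votes of n_runs runs."""
    rng = np.random.default_rng(seed)
    Z = pca_project(X, n_components)
    votes = np.zeros(len(X), dtype=int)
    for _ in range(n_runs):
        codes = hash_codes(Z, n_subspaces, k, rng)
        votes += select_once(codes, ratio, rng)
    return np.flatnonzero(votes >= min_votes)

# Example on synthetic data (hypothetical, not one of the paper's 7 datasets).
X_demo = np.random.default_rng(42).normal(size=(200, 5))
kept = voting_instance_selection(X_demo)
print(f"kept {len(kept)} of {len(X_demo)} instances")
```

In this sketch, `ratio`, `n_runs`, and `min_votes` together determine how many instances survive, which is one plausible reading of how the compression ratio is controlled through parameters.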
